In a classification rule for deciding between two possible classes,
a single threshold test is generally used. If, however, one or both
of the class probability density functions for the decision variable
are multimodal, or if the class variances are unequal, it may become
desirable to use multiple thresholds to bracket the several regions of
the decision variable assigned to the two classes. A simple count of
the number of inflection points in the operating characteristic curve
generated from the single threshold case determines the maximum
number of thresholds that should be used.

I. INTRODUCTION

In the discrimination problem where one wants to decide whether the
scalar quantity $x$ comes from class 1 or class 2, the optimal
decision rule, using either the Bayes or Neyman-Pearson criterion, is
a threshold test on the likelihood ratio: if $\Lambda(x) < T$ decide
class 1, otherwise decide class 2, where $\Lambda(x) = p_2(x)/p_1(x)$
and $p_i(x)$ is the probability density function of $x$ given class
$i$, $i = 1, 2$. In many problems $\Lambda$ is a monotonic function of
$x$, so that the test of $\Lambda(x)$ against $T$ is equivalent to a
threshold test of $x$ against $x_0$, where $x_0$ is the divide point
that separates the $x$ axis into class 1 and class 2 regions, i.e.,
$\Lambda(x_0) = T$. If, however, $\Lambda(x)$ is not a monotonic
function of $x$, as can happen with multimodal densities or when one
density function is sufficiently wide that it extends beyond both
sides of the other class density function, then the test of
$\Lambda(x)$ against $T$ can give rise to several divide points on the
$x$ axis separating alternating class 1 and class 2 regions. The
number of divide points is the number of roots of the equation
$\Lambda(x) = T$. One can either find which region the observed value
of $x$ falls in or test $\Lambda(x)$ against $T$ directly; the two
approaches are equivalent.
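As an illustration of how a non-monotonic likelihood ratio produces several divide points, the following Python sketch locates the roots of $\Lambda(x) = T$ by a sign-change scan followed by bisection. The densities (a unit Gaussian for class 1 and an equal-weight mixture of unit Gaussians centered at -3 and +3 for class 2), the value T = 1, the search interval, and all function names are assumptions chosen for illustration; none of them comes from this note.

```python
import math

def gauss_pdf(x, mean, sigma):
    """Gaussian probability density with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def p1(x):
    # Assumed class 1 density: a single unit-variance Gaussian centered at 0.
    return gauss_pdf(x, 0.0, 1.0)

def p2(x):
    # Assumed class 2 density: a bimodal, equal-weight Gaussian mixture.
    return 0.5 * gauss_pdf(x, -3.0, 1.0) + 0.5 * gauss_pdf(x, 3.0, 1.0)

def likelihood_ratio(x):
    """Lambda(x) = p2(x) / p1(x)."""
    return p2(x) / p1(x)

def divide_points(T, lo=-6.0, hi=6.0, steps=1200):
    """Roots of Lambda(x) = T on [lo, hi]: scan for sign changes, refine by bisection."""
    g = lambda x: likelihood_ratio(x) - T
    dx = (hi - lo) / steps
    roots = []
    for i in range(steps):
        a, b = lo + i * dx, lo + (i + 1) * dx
        if g(a) * g(b) < 0.0:
            for _ in range(60):
                mid = 0.5 * (a + b)
                if g(a) * g(mid) <= 0.0:
                    b = mid          # root lies in [a, mid]
                else:
                    a = mid          # root lies in [mid, b]
            roots.append(0.5 * (a + b))
    return roots

def decide(x, T):
    """Likelihood ratio test: class 1 if Lambda(x) < T, otherwise class 2."""
    return 1 if likelihood_ratio(x) < T else 2

roots = divide_points(T=1.0)
print(roots)
```

For these assumed densities the scan returns two divide points placed symmetrically about zero: values of $x$ between them are assigned to class 1 and values outside them to class 2, which is the same decision that `decide` reaches by evaluating the likelihood ratio directly.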

The problem is that often one does not know $p_1(x)$ or $p_2(x)$, and
therefore does not know $\Lambda(x)$ either. In the absence of such
knowledge, the usual way to proceed is to use the simplest decision
rule: if $x < x_0$ decide class 1, otherwise decide class 2, where
$x_0$ is a single divide point on the $x$ axis. Of course this is
equivalent to assuming the likelihood ratio is monotonic, which may
or may not actually be the case. The next useful step is to plot
the two kinds of decision error rates, $\alpha$ vs. $\beta$ (each
normalized between 0 and 1), in the form of an operating
characteristic (OC) curve, where the free running parameter is the
divide point $x_0$. Here $\alpha$ is the leakage rate (deciding class
1 when class 2 is correct) and $\beta$ is the false alarm rate
(deciding class 2 when class 1 is correct). If the optimal decision
rule would have been to use several divide points ($\Lambda(x)$
non-monotonic), then the OC curve obtained by using just one divide
point will have various twists and bends that would not otherwise be
there.
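The construction of the single threshold OC curve can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that both class distributions are known Gaussians (means 0 and 2, unit variance) so that the error rates follow from the cumulative distribution functions; in practice the rates would be counted from labeled data. The function name `oc_curve` and the threshold grid are likewise assumptions.

```python
from statistics import NormalDist

def oc_curve(class1, class2, thresholds):
    """Error rates for the single-threshold rule 'decide class 1 if x < x0'.

    Returns a list of (x0, alpha, beta) where
      alpha = P(x <  x0 | class 2)   leakage rate
      beta  = P(x >= x0 | class 1)   false alarm rate
    """
    curve = []
    for x0 in thresholds:
        alpha = class2.cdf(x0)
        beta = 1.0 - class1.cdf(x0)
        curve.append((x0, alpha, beta))
    return curve

# Assumed example: unit-variance Gaussians with means 0 (class 1) and 2 (class 2).
class1 = NormalDist(mu=0.0, sigma=1.0)
class2 = NormalDist(mu=2.0, sigma=1.0)
thresholds = [-4.0 + 0.1 * i for i in range(101)]   # x0 swept from -4 to 6
curve = oc_curve(class1, class2, thresholds)

for x0, alpha, beta in curve[::20]:
    print(f"x0={x0:5.1f}  alpha={alpha:.4f}  beta={beta:.4f}")
```

As the divide point moves to the right, $\alpha$ rises from 0 toward 1 while $\beta$ falls from 1 toward 0; plotting one against the other traces out the OC curve.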

We wish to use the information contained in the twists and bends to
work backwards (still in the absence of any knowledge of $\Lambda(x)$
or the class density functions) to learn how many thresholds on $x$,
i.e., how many divide points, should be used in redesigning the
decision rule on $x$ if more than one threshold is appropriate. In
the analysis that follows, we will instead obtain a weaker piece of
information, namely an upper bound on the number of thresholds, based
on the observed number of inflection points in the OC curve. The
actual number of thresholds that should be used depends on such
things as the a priori probabilities of occurrence for the two
classes and the cost functions, all of which are ignored here or
assumed unknown.

If one suspects from looking at the OC curve that $\Lambda(x)$ is
non-monotonic, then one can investigate the matter more thoroughly by
estimating $p_1(x)$ and $p_2(x)$ with histograms or Parzen density
estimates constructed from the given data. Indeed, one could proceed
from the very beginning in every case by estimating the class density
functions and thereby learn how many thresholds on $x$ to use, but
that effort is wasted if it turns out that one threshold on $x$ was
in fact optimal. Also, estimating continuous density functions is
not a controversy-free procedure; the choice of bin size and
smoothing kernel can critically affect the results, and convergence
to the true underlying density function is not always assured. The
suggestion advocated in this note is to go ahead and use one
threshold on $x$, obtain the OC curve, and, if examination of the
curve shows that more than one threshold might be optimal, then
construct estimates of $p_1(x)$, $p_2(x)$, and $\Lambda(x)$.
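For the follow-up step of estimating the densities, a Parzen estimate with a Gaussian kernel might look like the sketch below. The kernel shape, the width `h`, the sample values, and the function names are all assumptions made for illustration; as noted above, the choice of smoothing kernel and width can critically affect the result, and this sketch makes no attempt to choose them well.

```python
import math

def parzen_estimate(samples, h):
    """Return a Gaussian-kernel Parzen density estimate built from 1-D samples.

    h is the kernel width (smoothing parameter); its choice is left to the user.
    """
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return density

def estimated_likelihood_ratio(samples1, samples2, h):
    """Estimate Lambda(x) = p2(x)/p1(x) from labeled samples of the two classes."""
    p1_hat = parzen_estimate(samples1, h)
    p2_hat = parzen_estimate(samples2, h)
    return lambda x: p2_hat(x) / p1_hat(x)

# Hypothetical labeled data, for illustration only.
samples1 = [-0.4, -0.1, 0.0, 0.2, 0.5]
samples2 = [-2.1, -1.8, 1.7, 2.0, 2.4]
lam_hat = estimated_likelihood_ratio(samples1, samples2, h=0.5)
print(lam_hat(0.0), lam_hat(2.0))
```

With these hypothetical samples the estimated ratio is small near $x = 0$ and large near $x = 2$, which is the kind of non-monotonic behavior that would call for more than one threshold.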

The next section contains a short catalog of example density
functions and the corresponding schematic OC curves. Section III
contains conclusions that have been illustrated in the catalog and
proofs of those conclusions. Section IV concludes with some worked
examples based on Gaussian densities.

II. A SHORT CATALOG OF SINGLE THRESHOLD OC CURVES FOR VARIOUS
TYPES OF CLASS DISTRIBUTIONS

The following single threshold OC curves are sketched on linear
scales for the error rates $\alpha$ and $\beta$. The catalog does
not begin to exhaust all the possibilities for the class
distributions (probability density functions), but it should be
sufficiently complete to allow the basic ideas to become apparent
and to permit stating some general theorems. The OC curves are
sketched using a single (variable) threshold, and the arbitrary
convention chosen has been to call class 1 the class with the smaller
mean value and to decide class 1 if $x < x_0$ and class 2 otherwise.
Scales are suppressed on all but the first OC curve. The inflection
points are marked on the curves and will be featured in a theorem
later. Example (desired) decision thresholds on $x$ are marked on
the distribution sketches. Modes that occur in the wings of the
distributions may or may not affect the decision rule, depending on
whether they call for the addition of another threshold. Those which
do not are unimportant and are ignored after the first catalog entry.


III. CONCLUSIONS AND PROOFS


From the OC curve illustrations, all sketched by using a single
(variable) threshold on $x$ and linear error rate scales, the
following conclusions can be made:

1. The single threshold is optimal if there is no inflection
point on the OC curve.

2. More generally, the upper bound on the number of thresholds
that should be used is one more than the number of inflection
points on the OC curve.

3. If there is one inflection point or more on the OC curve,
then the likelihood ratio, $\Lambda(x) = p_2(x)/p_1(x)$, either
(i) is not a monotonic function of $x$, or (ii) has a horizontal
inflection point, i.e., $\Lambda'(x_1) = \Lambda''(x_1) = 0$ for
some $x_1$. (The occurrence of (ii) is expected to be very rare.)

4. The two class distributions can be considered identical if
the OC curve passes sufficiently near to the equal error point
(0.5, 0.5) and the curve is sufficiently close to a 45° straight
line.

Proofs

Since 1. above is a special case of 2., it is sufficient to prove
that 2. is true. One can begin by obtaining the condition for an
inflection point in the OC curve. The detection rate is
$P_D(x_0) = 1 - \alpha(x_0) = 1 - \int_{-\infty}^{x_0} p_2(x)\,dx$
and the false alarm rate is
$\beta(x_0) = \int_{x_0}^{\infty} p_1(x)\,dx$. The slope of the OC
curve is

$$\frac{dP_D}{d\beta} = \frac{dP_D/dx_0}{d\beta/dx_0} = \frac{p_2(x_0)}{p_1(x_0)} = \Lambda(x_0),$$

having used Leibniz's rule for differentiating the definite
integrals. Differentiating again,

$$\frac{d^2 P_D}{d\beta^2} = \frac{d\Lambda/dx_0}{d\beta/dx_0} = -\frac{1}{p_1^2}\left(\frac{dp_2}{dx_0} - \frac{p_2}{p_1}\,\frac{dp_1}{dx_0}\right).$$

The condition for an inflection point, $d^2 P_D/d\beta^2 = 0$, can be
written as $p_2'(x)/p_2(x) = p_1'(x)/p_1(x)$, where the dummy
argument $x_0$ has been changed back to $x$, and primes denote
differentiation with respect to the argument. The condition can be
rewritten as

$$\frac{d}{dx}\ln\Lambda(x) = 0. \tag{1}$$

The number of inflection points in the OC curve is the number of
roots of Eq. (1). Since $\ln\Lambda$ is a monotonic function of
$\Lambda$, the number of roots and the values of the roots of Eq. (1)
are identical to those of the equation

$$\Lambda'(x) = 0.$$

Thus the number of inflection points in the OC curve is the number
of places where the slope of $\Lambda(x)$ is zero. Generally that is
the number of extrema of a non-monotonic $\Lambda(x)$, but the rare
occurrence of an inflection point of $\Lambda(x)$ having zero slope
can also contribute an OC curve inflection point. Conclusion #3 is
therefore proved. By knowing the values of $x$ at the OC curve
inflection points, one then knows the values of $x$ for which the
likelihood ratio has zero slope, i.e., one knows something about
$\Lambda(x)$ without having had to estimate the class density
functions.
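Because the slope of the OC curve equals $\Lambda(x_0)$, inflection points can be counted from the sampled curve alone by looking for sign changes in the change of chord slope between successive points. The Python sketch below does this for two assumed Gaussian cases whose error rates are computed exactly from the cumulative distribution functions; the function names, the sweep ranges, and the tolerance `tol` are illustrative assumptions.

```python
from statistics import NormalDist

def oc_points(class1, class2, lo, hi, n):
    """(beta, P_D) pairs for the rule 'decide class 1 if x < x0', x0 swept over [lo, hi]."""
    pts = []
    for i in range(n + 1):
        x0 = lo + (hi - lo) * i / n
        beta = 1.0 - class1.cdf(x0)   # false alarm rate
        p_d = 1.0 - class2.cdf(x0)    # detection rate, 1 - alpha
        pts.append((beta, p_d))
    return pts

def count_inflections(points, tol=1e-9):
    """Count sign changes of the change in chord slope along a sampled OC curve."""
    slopes = [(d1 - d0) / (b1 - b0)
              for (b0, d0), (b1, d1) in zip(points, points[1:]) if b1 != b0]
    changes, last_sign = 0, 0
    for s0, s1 in zip(slopes, slopes[1:]):
        diff = s1 - s0
        if abs(diff) <= tol:          # treat tiny changes as numerical noise
            continue
        sign = 1 if diff > 0 else -1
        if last_sign != 0 and sign != last_sign:
            changes += 1
        last_sign = sign
    return changes

# Assumed test cases: Gaussian classes with equal and with unequal variances.
equal_var = oc_points(NormalDist(0, 1), NormalDist(2, 1), -3.0, 5.0, 800)
unequal_var = oc_points(NormalDist(0, 1), NormalDist(0, 2), -4.0, 4.0, 800)
print(count_inflections(equal_var), count_inflections(unequal_var))
```

Under these assumptions the equal-variance pair shows no inflection point and the unequal-variance pair shows one. With error rates estimated from a finite data set, the tolerance would have to be set well above the sampling noise, which this sketch does not model.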

The optimal number of thresholds on $x$ is the number of real roots
of $\Lambda(x) = T$. This number of roots, $t$, is at most one
greater than the number of local extrema, $N$, of $\Lambda(x)$:
between any two consecutive roots $\Lambda(x)$ must pass through at
least one local extremum. The argument is topological and is
illustrated for a case of $N = 4$ in Fig. 1 for several different
values of $T$.


[Fig. 1. $\Lambda(x)$ with $N = 4$ extrema, crossed by several threshold levels $T$ that give 1, 3, and 5 roots.]

In the illustration, since there are 4 extrema of $\Lambda(x)$, the
maximum number of roots of $\Lambda(x) = T$ is 5, as in the case of
$T_1$. The general case is $t \le N + 1$, and since the upper bound
on $N$ is the number of roots of $\Lambda'(x) = 0$, it follows that
the upper bound on the optimal number of thresholds on $x$ is one
greater than the number of inflection points on the OC curve, proving
conclusions #1 and #2.
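The bound $t \le N + 1$ can be checked numerically for any particular likelihood ratio by counting sign changes on a grid. The Python sketch below does so for an assumed pair of densities (a broad Gaussian for class 1 and a bimodal mixture with narrow modes for class 2), for which $\Lambda(x)$ has three extrema; the densities, the grid, the values of T, and all names are illustrative assumptions.

```python
from statistics import NormalDist

# Assumed densities: a broad class 1 and a bimodal class 2 with narrow modes.
_c1 = NormalDist(0.0, 1.5)
_c2a, _c2b = NormalDist(-2.0, 0.5), NormalDist(2.0, 0.5)

def lam(x):
    """Likelihood ratio Lambda(x) = p2(x)/p1(x) for the assumed densities."""
    return (0.5 * _c2a.pdf(x) + 0.5 * _c2b.pdf(x)) / _c1.pdf(x)

def grid(lo, hi, steps):
    return [lo + (hi - lo) * i / steps for i in range(steps + 1)]

def count_sign_changes(values):
    """Number of sign changes in a sequence, ignoring exact zeros."""
    changes, last = 0, 0
    for v in values:
        sign = (v > 0) - (v < 0)
        if sign == 0:
            continue
        if last != 0 and sign != last:
            changes += 1
        last = sign
    return changes

def count_extrema(f, lo=-6.0, hi=6.0, steps=4000):
    """N: sign changes of the forward difference of f on the grid."""
    ys = [f(x) for x in grid(lo, hi, steps)]
    return count_sign_changes([y1 - y0 for y0, y1 in zip(ys, ys[1:])])

def count_roots(f, T, lo=-6.0, hi=6.0, steps=4000):
    """t: sign changes of f(x) - T on the grid."""
    return count_sign_changes([f(x) - T for x in grid(lo, hi, steps)])

N = count_extrema(lam)
for T in (0.0005, 1.0, 5.0):
    t = count_roots(lam, T)
    print(f"T={T}: t={t}, N+1={N + 1}")
```

For this assumed $\Lambda(x)$ the number of roots depends on where $T$ falls relative to the central minimum and the two peaks, but it never exceeds $N + 1 = 4$.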

Finally, in conclusion #4, if $p_1(x) = p_2(x)$ for all $x$, then
$\alpha = 1 - \beta$ and the resulting OC curve is a 45° straight
line, a well known result.

IV. EXAMPLES USING GAUSSIAN DENSITY FUNCTIONS

To conclude this note, two examples are offered, illustrating the
relation between the number of OC curve inflection points and the
optimal number of divide points on the $x$ axis, i.e., the number of
thresholds on $x$.

Example 1. Gaussians with the same standard deviation,
$\sigma_1 = \sigma_2 = 1$, and different means, $m_1 = 0$,
$m_2 = m \neq 0$.

$$p_1(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}, \qquad p_2(x) = \frac{e^{-(x-m)^2/2}}{\sqrt{2\pi}}$$

The likelihood ratio is $\Lambda(x) = p_2/p_1 = e^{mx - m^2/2}$. The
number of inflection points on the OC curve is given by the number of
roots of $\Lambda'(x) = 0$, or more conveniently here by Eq. (1),
$[\ln\Lambda(x)]' = 0$. In this case it is the equation $m = 0$, an
equation which contradicts the initial assumption, $m \neq 0$, and
therefore the equation is not satisfied for any $x$, i.e., it has no
roots and therefore the OC curve has no inflection point. Therefore
we should expect one threshold to be optimal, by conclusion #1. That
such is the case is found by noting that $\Lambda(x)$ is a monotonic
function of $x$ and therefore $\Lambda(x) = T$ has one root, implying
that one threshold on $x$ is, in fact, optimal.
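For Example 1 the single divide point follows in closed form by solving $e^{m x_0 - m^2/2} = T$ for $x_0$, giving $x_0 = (\ln T)/m + m/2$. The short Python sketch below evaluates this and confirms it against the ratio of the two Gaussian densities; the values of `m` and `T` and the function name are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def single_threshold(m, T):
    """Divide point x0 solving exp(m*x0 - m**2/2) = T for unit-variance Gaussians
    with means 0 (class 1) and m (class 2); requires m != 0 and T > 0."""
    return math.log(T) / m + m / 2.0

m, T = 2.0, 3.0                      # assumed values for illustration
x0 = single_threshold(m, T)

# Check: the density ratio p2(x0)/p1(x0) should reproduce T.
ratio = NormalDist(m, 1.0).pdf(x0) / NormalDist(0.0, 1.0).pdf(x0)
print(x0, ratio)
```

With $T = 1$ the divide point reduces to the midpoint $m/2$ between the two means.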

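Example 2. Gaussians with the same mean, $m$, but different standard
deviations, $\sigma_1 = 1$, $\sigma_2 = \sigma \neq 1$.

$$p_1(x) = \frac{e^{-(x-m)^2/2}}{\sqrt{2\pi}}, \qquad p_2(x) = \frac{e^{-(x-m)^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}$$

The likelihood ratio is
$\Lambda(x) = \frac{1}{\sigma}\, e^{-\frac{1}{2}\left(\frac{1}{\sigma^2} - 1\right)(x - m)^2}$.
Eq. (1) becomes
$[\ln\Lambda(x)]' = -\left(\frac{1}{\sigma^2} - 1\right)(x - m) = 0$,
which has one root, $x = m$, so that there is one inflection point on
the OC curve. Therefore by conclusion #2 we should expect the optimal
number of thresholds on $x$ to be either 2 or 1. The optimal number
of thresholds is the number of roots of $\Lambda(x) = T$, and since
$\Lambda(x)$ is a Gaussian-type function of $x$ (the exponential of a
quadratic in $x - m$, symmetric about $x = m$), there are two roots
for meaningful $T$.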
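The two thresholds of Example 2 can also be written in closed form. Setting $\Lambda(x) = T$ and taking logarithms gives $(x - m)^2 = 2\ln(\sigma T)/(1 - 1/\sigma^2)$, which has two real roots whenever the right-hand side is positive. The Python sketch below evaluates these roots and checks them against the density ratio; the numerical values and the function name are assumptions for illustration.

```python
import math
from statistics import NormalDist

def two_thresholds(m, sigma, T):
    """Roots of Lambda(x) = T for Gaussians with common mean m, sigma_1 = 1 and
    sigma_2 = sigma != 1. Returns () when T does not cut Lambda(x) in two points."""
    q = 2.0 * math.log(sigma * T) / (1.0 - 1.0 / sigma**2)   # equals (x - m)**2
    if q <= 0.0:
        return ()
    r = math.sqrt(q)
    return (m - r, m + r)

m, sigma, T = 0.0, 2.0, 1.0          # assumed values for illustration
x_lo, x_hi = two_thresholds(m, sigma, T)

# Check: the density ratio p2/p1 at each root should reproduce T.
for x in (x_lo, x_hi):
    print(x, NormalDist(m, sigma).pdf(x) / NormalDist(m, 1.0).pdf(x))
```

For $\sigma > 1$ the likelihood ratio grows with $|x - m|$, so class 2 is decided outside the interval between the two thresholds and class 1 inside it; values of $T$ for which the function returns an empty tuple are the ones that are not "meaningful" in the sense used above.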